The OpenClaw AI framework faces a supply chain attack via ClawHub, where malware disguised as tools spreads through community-developed 'skills'.
Kunlun Tian Gong launches 'Skywork Desktop Edition', upgrading AI assistants from web chat to system-level proactive collaboration tools. It runs locally and accesses and processes files directly on the user's computer, so sensitive files need not be uploaded to the cloud, giving Windows users a safer, more efficient AI collaboration experience.
Kunlun Wanwei launches the 'Skywork Desktop' AI app, emphasizing local operation for enhanced data security and redefining intelligent desktop office solutions.
ZTE Communications has launched the enterprise version of Co-Claw, a desktop intelligent agent. By enhancing enterprise deployment, security governance, and capability reuse, it promotes the large-scale application of AI agents. The product unifies the operating environment, replacing traditional local mini server deployments, and helps enterprises achieve intelligent transformation in office work.
An open-source, self-hosted personal AI assistant that can control your computer.
Auron AI turns your computer into an intelligent companion, helping you manage tasks, automate operations, and communicate naturally.
Runable is a general-purpose automation agent that can automate any digital task that humans perform on a computer.
A virtual computer assistant that can perform tasks such as searching or creating images.
Anthropic: $21 per million input tokens, $105 per million output tokens, context length 200
Google
-
Baichuan
4
Chatglm
$5
128
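As a rough illustration of how per-million-token prices translate into a per-request cost, the sketch below uses the Anthropic figures listed above ($21 input and $105 output per million tokens). The function name and the token counts are hypothetical and chosen only for the example.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the cost of one request given per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical request: 2,000 input tokens and 500 output tokens,
# priced with the listed figures ($21 and $105 per million tokens).
cost = estimate_cost(2_000, 500, 21.0, 105.0)
print(f"Estimated cost: ${cost:.4f}")
```

Output tokens dominate the total here even though there are fewer of them, because the listed output price is five times the input price.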
prithivMLmods
ActIO-UI-7B-RLVR is a 7-billion-parameter vision-language model released by Uniphore, designed for computer interface automation tasks. It is based on Qwen2.5-VL-7B-Instruct and optimized through supervised fine-tuning and reinforcement learning with verifiable rewards. It performs well in tasks such as GUI navigation, element positioning, and interaction planning, and leads open-source 7B models on the WARC-Bench benchmark.
rujutashashikanjoshi
This is an object detection model fine-tuned on a custom dataset based on the YOLOv12 Medium architecture. This model is specifically designed to efficiently and accurately detect drone targets in images or videos, providing support for computer vision applications.
Trilogix1
Fara-7B is an efficient small language model designed by Microsoft for computer use scenarios. With only 7 billion parameters, it performs well on advanced user tasks such as web operations and competes with larger agent systems.
noctrex
Gelato-30B-A3B is a state-of-the-art (SOTA) model fine-tuned for GUI computer usage tasks, offering a quantized version to optimize deployment efficiency. This model is specifically designed to understand and process tasks related to graphical user interfaces.
microsoft
Fara-7B is a small language model developed by Microsoft Research, specifically designed for computer usage scenarios. It has only 7 billion parameters and achieves excellent performance among models of the same scale. It can perform computer interaction tasks such as web automation and multimodal understanding.
almanach
Gaperon-Young-1125-1B is a bilingual (French-English) language model with 1.5 billion parameters, developed by the ALMAnaCH team at the French National Institute for Research in Computer Science and Control (Inria Paris). The model is trained on approximately 3 trillion high-quality tokens, with a particular focus on language quality and general text generation ability rather than benchmark test optimization.
mlfoundations
Gelato-30B-A3B is a state-of-the-art foundation model for GUI computer usage tasks. It is trained on the Click-100k dataset and outperforms previous specialized computer foundation models and larger vision-language models in multiple benchmark tests.
xlangai
OpenCUA is an end-to-end computer usage foundation model series, built on the Qwen2.5-VL instruction model, capable of generating executable operations in a computer environment. It has powerful visual positioning and multi-step task planning capabilities, and performs excellently in computer usage agent benchmark tests such as OSWorld.
timm
This is a vision Transformer model based on the DINOv3 framework, distilled from the DINOv3 ViT-7B model on the LVD-1689M dataset. It is designed for image feature encoding and can efficiently extract image feature representations, making it suitable for various computer vision tasks.
This is a vision Transformer model based on the DINOv3 architecture, using a small configuration and trained through knowledge distillation on the LVD-1689M dataset. This model is specifically designed for efficient image feature extraction and supports various computer vision tasks such as image classification, feature map extraction, and image embedding.
Piero2411
This is a computer vision model based on the YOLOv8s architecture, specifically designed for barcode and QR code detection. The model has been fine-tuned on a comprehensive dataset containing more than 5000 images, supporting accurate detection and classification of multiple barcode types (such as EAN13, Code128, etc.) and QR codes.
macpaw-research
This is a computer vision model fine-tuned based on Ultralytics/YOLO11, specifically designed to detect UI elements in macOS application screenshots. It is part of the Screen2AX project, dedicated to generating accessibility metadata using computer vision technology.
logasanjeev
A powerful computer vision tool capable of classifying, detecting, and extracting text from Indian ID card documents.
lmstudio-community
An image-text to text generation model based on the Transformer architecture, designed specifically for computer/GUI-related scenarios, with intelligent agent capabilities.
Zeta-LLM
Zeta 2 is a small language model (SLM) with approximately 460 million parameters, built on consumer-grade computers, with support for multiple languages.
Kar1hik
This model is fine-tuned from the DINOv2 architecture for disease classification of skin lesion images.
onnx-community
This is the ONNX format version of the facebook/dinov2-base model, suitable for computer vision tasks.
nvidia
The first hybrid computer vision model combining the strengths of Mamba and Transformer, enhancing visual feature modeling efficiency by reconstructing the Mamba formula, and introducing self-attention modules in the final layers of the Mamba architecture to improve long-range spatial dependency modeling.
MambaVision is the first hybrid computer vision model combining the strengths of Mamba and Transformer. It enhances visual feature modeling by redesigning the Mamba formulation and incorporates self-attention modules in the final layers of the Mamba architecture to improve long-range spatial dependency modeling.
The first hybrid computer vision model combining the advantages of Mamba and Transformer, enhancing visual feature modeling capability by reconstructing the Mamba formula.
Contains computer control and automation components for MCP servers
Showcasing the integration of computer vision tools with language models through MCP
MCP-DBLP is a service based on the Model Context Protocol (MCP) that provides large language models with the ability to access the DBLP computer science literature database, including functions such as search, citation processing, and BibTeX export.
A computer vision server implemented based on Ultralytics and the MCP protocol, supporting functions such as object detection, image segmentation, and pose estimation
A TypeScript-based MCP server for file system editing tools, ported from the Anthropic computer use demo.
An OpenAI agent server based on the MCP protocol, providing various professional agents (such as web search, file search, and computer operation) and a multi-agent coordinator, which can interact with clients (such as the Claude desktop application) through the MCP protocol.
An MCP server for seamless integration with computer peripherals, providing a unified API to control, monitor, and manage hardware devices, including cameras, printers, audio devices, and screens.
An MCP server based on computer vision that automatically identifies the positions of image assets and extracts the layout structure by analyzing web page screenshots, supports the detection of multiple layout patterns such as radial and grid, and helps AI assistants accurately reconstruct web page layouts.
A server based on the MCP protocol, providing the function of querying the prices of computer components on the CoolPC website in Taiwan and automatically generating computer configuration quotes.
An MCP server that provides information about the applications installed on a computer, supporting macOS and Windows, and can be integrated with compatible AI assistants.
A privacy-first document search server that runs entirely locally, providing semantic search for AI programming tools through MCP. No API keys or cloud services are required, and all data processing is completed on the user's computer.
The YOLO MCP Service is a powerful computer vision service that integrates with Claude AI through the Model Context Protocol (MCP), providing functions such as object detection, segmentation, classification, and real-time camera analysis.
Computer control and automation components of the MCP server
Desktop Commander MCP is a service that enables the Claude desktop application to execute terminal commands on the user's computer and manage processes through the Model Context Protocol (MCP). It provides terminal command execution, process management, file system operations, and code editing functions, supporting long-running commands and differential file editing.
A TypeScript-based MCP server for interacting with DAOs on the Internet Computer
An MCP server based on nut.js that provides comprehensive control functions for the computer screen, mouse, and keyboard, including screenshot, mouse operation, keyboard input, window management, and clipboard access.
This is an MCP server designed for the Commodore 64 Ultimate (the official modern C64 computer). It allows AI assistants (such as Claude, ChatGPT) to remotely control C64 hardware through a REST API, supporting functions such as program loading, memory operations, and disk management.
An MCP server that provides computer control functions, including mouse and keyboard control, OCR recognition, window management, etc., implemented based on PyAutoGUI and RapidOCR without external dependencies.
An MCP service that allows Claude to control audio playback on the computer
Claude Desktop Commander MCP is a server tool that allows the Claude desktop application to execute terminal commands and manage processes on the user's computer. It is built based on the Model Context Protocol (MCP) and provides file system operations and code editing functions.
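Several of the servers listed above expose their functions to clients as MCP tools. As a minimal sketch of what a client-side tool invocation can look like, the snippet below builds a JSON-RPC 2.0 request using the `tools/call` method defined by the Model Context Protocol. The tool name `execute_command` and its `command` argument are hypothetical and do not describe the actual interface of any server in this list.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str,
                    arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "execute_command", {"command": "echo hello"})
print(json.dumps(request, indent=2))
```

In practice an MCP client library would handle transport, initialization, and tool discovery; the sketch only shows the shape of the request message.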